22 research outputs found

Machine Learning Approaches for Natural Resource Data

Author: Pohjankukka Jonne
Publication venue: Turku Centre for Computer Science
Publication date: 15/06/2018

Abstract: Real-life applications involving efficient management of natural resources depend on accurate geographical information. This information is usually obtained by manual on-site data collection, by automatic remote sensing methods, or by a combination of the two. Besides accurate data collection, natural resource management also requires detailed analysis of the data, which in the era of the data flood can be a cumbersome process. With rising computational power and storage capacity, together with falling hardware prices, data-driven decision analysis has an ever greater role. In this thesis, we examine the predictability of terrain trafficability conditions and forest attributes by using a machine learning approach with geographic information system data. Quantitative measures of the prediction performance of terrain conditions using natural resource data sets are given for five distinct research areas located around Finland. Furthermore, the estimation capability of key forest attributes is inspected with a multitude of modeling and feature selection techniques. The research results provide empirical evidence on whether the natural resource data used is sufficiently accurate for practical applications, or whether further refinement of the data is needed. The results are especially important to the forest industry, since even slight improvements to the natural resource data sets utilized in practice can result in large savings in operation time and costs. Model evaluation is also addressed in this thesis by proposing a novel method for estimating the prediction performance of spatial models. Classical goodness-of-fit measures usually rely on the assumption of independently and identically distributed data samples, an assumption that normally does not hold for spatial data sets.
Spatio-temporal data sets contain an intrinsic property called spatial autocorrelation, which is partly responsible for breaking these assumptions. The proposed cross-validation-based evaluation method provides model performance estimates in which the optimistic bias due to spatial autocorrelation is decreased by partitioning the data sets in a suitable way. Keywords: Open natural resource data, machine learning, model evaluation
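The abstract does not spell out how the data sets are partitioned, so the following is only a minimal sketch of one common way to reduce the optimistic bias from spatial autocorrelation: leave-one-out cross-validation in which training points closer than a "dead zone" radius to the test point are dropped. The function names, the `dead_zone_radius` parameter, and the 1-nearest-neighbour stand-in model are assumptions made for illustration, not taken from the thesis.

```python
import math


def nearest_neighbor_predict(train_X, train_y, x):
    """1-nearest-neighbour regression in feature space (illustrative stand-in model)."""
    if not train_X:
        raise ValueError("training set is empty; dead zone radius is too large")
    best = min(range(len(train_X)), key=lambda j: math.dist(train_X[j], x))
    return train_y[best]


def spatial_loo_predictions(coords, features, targets, dead_zone_radius, predict):
    """Leave-one-out CV that omits training points within dead_zone_radius of the test point.

    coords   : list of (x, y) locations, one per sample
    features : list of feature vectors, one per sample
    targets  : list of target values, one per sample
    predict  : callable(train_X, train_y, x) -> prediction (hypothetical interface)
    """
    predictions = []
    for i, test_xy in enumerate(coords):
        # Keep only samples that lie strictly outside the dead zone around the test point.
        train_idx = [
            j for j, xy in enumerate(coords)
            if j != i and math.dist(xy, test_xy) > dead_zone_radius
        ]
        train_X = [features[j] for j in train_idx]
        train_y = [targets[j] for j in train_idx]
        predictions.append(predict(train_X, train_y, features[i]))
    return predictions


def mean_abs_error(predictions, targets):
    """Average absolute difference between predictions and true values."""
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)
```

With a radius of zero this reduces to ordinary leave-one-out; increasing the radius removes spatially close, and therefore likely correlated, neighbours from the training set, which typically yields a less optimistic error estimate.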

UTUPub

Bayesian Approach for Optimizing Forest Inventory Survey Sampling with Remote Sensing Data

Author: Heikkonen Jukka
Pohjankukka Jonne
Tuominen Sakari
Publication venue: 'MDPI AG'
Publication date: 13/12/2022

In large-area forest inventories, a trade-off between the amount of data to be sampled and the corresponding collection costs is necessary. It is not always possible to have a very large data sample when dealing with sampling-based inventories. It is therefore important to optimize the sampling design within the limited resources. Whereas inventories of this sort are subject to these constraints, the availability of remote sensing (RS) data correlated with the forest inventory variables is usually much higher. For this reason, RS data and sampled field measurement data are often used in combination to improve forest inventory estimation. In this study, we propose a model-based data sampling method founded on Bayesian optimization and machine learning algorithms, which utilizes RS data to guide forest inventory sample selection. We evaluate our method in empirical experiments using real-world volume of growing stock data from the Åland region in Finland. The proposed method is compared against two baseline methods: simple random sampling and the local pivotal method. When a suitable model link is selected, the empirical experiments show, in the best case, average improvements of up to 22% and 79% in population mean and variance estimation, respectively, over the baselines. However, the results also illustrate the importance of model selection, which has a clear effect on the results. The novelty of the study lies in the application of Bayesian optimization to national forest inventory survey sampling.

UTUPub

The spatial leave-pair-out cross-validation method for reliable AUC estimation of spatial classifiers

Author: Antti Airola
Johanna Torppa
Jonne Pohjankukka
Jukka Heikkonen
Maarit Middleton
Tapio Pahikkala
Vesa Nykänen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 28/10/2022

Machine learning based classification methods are widely used in geoscience applications, including mineral prospectivity mapping. Typical characteristics of the data, such as a small number of positive instances, imbalanced class distributions, and a lack of verified negative instances, make ROC analysis and cross-validation natural choices for classifier evaluation. However, recent literature has identified two sources of bias that can affect the reliability of area under ROC curve (AUC) estimation via cross-validation on spatial data. The pooling procedure performed by methods such as leave-one-out can introduce a substantial negative bias to the results. At the same time, spatial dependencies leading to spatial autocorrelation can produce overoptimistic results if not corrected for. In this work, we introduce the spatial leave-pair-out cross-validation method, which corrects for both of these biases simultaneously. The methodology is used to benchmark a number of classification methods on mineral prospectivity mapping data from the Central Lapland greenstone belt. The evaluation highlights the dangers of obtaining misleading results on spatial data and demonstrates how these problems can be avoided. Further, the results show the advantages of simple linear models for this classification task.
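The abstract names the method but gives no algorithmic detail, so the sketch below is only one plausible reading of it. Every positive-negative pair is held out together, training points within an assumed `radius` of either pair member are discarded, and the AUC estimate is the fraction of pairs in which the positive instance receives the higher score. Scoring each pair with its own model avoids pooling predictions across folds. The `fit_score` interface and the centroid-based scorer are hypothetical stand-ins, not the classifiers benchmarked in the paper.

```python
import math


def centroid_scorer(train_X, train_y):
    """Fit a toy scorer: higher score means closer to the positive-class centroid."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    mu_pos = mean([x for x, t in zip(train_X, train_y) if t == 1])
    mu_neg = mean([x for x, t in zip(train_X, train_y) if t == 0])
    return lambda x: math.dist(x, mu_neg) - math.dist(x, mu_pos)


def spatial_lpo_auc(coords, X, y, radius, fit_score):
    """Leave-pair-out AUC estimate with a spatial dead zone around both held-out points."""
    positives = [i for i, label in enumerate(y) if label == 1]
    negatives = [i for i, label in enumerate(y) if label == 0]
    total, used = 0.0, 0
    for i in positives:
        for j in negatives:
            # Training set: everything outside the dead zone of both pair members.
            train = [
                k for k in range(len(y))
                if k not in (i, j)
                and math.dist(coords[k], coords[i]) > radius
                and math.dist(coords[k], coords[j]) > radius
            ]
            if {y[k] for k in train} != {0, 1}:
                continue  # assumption: skip pairs whose training set lost a class
            score = fit_score([X[k] for k in train], [y[k] for k in train])
            s_pos, s_neg = score(X[i]), score(X[j])
            # Count a correctly ordered pair as 1, a tie as 0.5, otherwise 0.
            total += 1.0 if s_pos > s_neg else 0.5 if s_pos == s_neg else 0.0
            used += 1
    if used == 0:
        raise ValueError("no usable pairs; radius is too large for this data set")
    return total / used
```

Setting `radius` to zero gives plain leave-pair-out cross-validation; how to choose a suitable radius for real data is outside the scope of this sketch.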

UTUPub

New computational methods for efficient utilisation of public data

Author: Ala-Ilomäki Jari
Cohen Juval
Heilimo Jyri
Hyvönen Eija
Hänninen Pekka
Ikonen Jaakko
Middleton Maarit
Nevalainen Paavo
Pahikkala Tapio
Pohjankukka Jonne
Pulliainen Jouni
Riihimäki Henri
Sutinen Raimo
Tuominen Sakari
Varjo Jari
Publication venue: 'Baishideng Publishing Group Inc.'
Publication date: 01/01/2015

201

Jukuri

FOTETRAF Advanced computational methodologies on open big data for forest terrain trafficability monitoring and forecasting

Author: Ala-Ilomäki Jari
Finér Leena
Heikkonen Jukka
Launiainen Samuli
Nevalainen Paavo
Pahikkala Tapio
Pohjankukka Jonne
Raduly-Baka Csaba
Salmivaara Aura
Tuominen Sakari
Publication venue: 'Baishideng Publishing Group Inc.'
Publication date: 01/01/2016

201

Jukuri

Radiomics and machine learning of multisequence multiparametric prostate MRI: Towards improved non-invasive prostate cancer characterization

Author: Aida Kiviniemi
Hannu J. Aronen
Harri Merisaari
Ileana Montoya Perez
Ivan Jambor
Jonne Pohjankukka
Jussi Toivonen
Marko Pesola
Parisa Movahedi
Pekka Taimen
Peter J. Boström
Tapio Pahikkala
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 28/10/2022

Purpose: To develop and validate a classifier system for prediction of prostate cancer (PCa) Gleason score (GS) using radiomics and texture features of T2-weighted imaging (T2w), diffusion weighted imaging (DWI) acquired using high b values, and T2-mapping (T2). Methods: T2w, DWI (12 b values, 0–2000 s/mm²), and T2 data sets of 62 patients with histologically confirmed PCa were acquired at 3T using surface array coils. The DWI data sets were post-processed using monoexponential and kurtosis models, while T2w was standardized to a common scale. Local statistics and 8 different radiomics/texture descriptors were utilized at different configurations to extract a total of 7105 unique per-tumor features. Regularized logistic regression with implicit feature selection and leave-pair-out cross-validation was used to discriminate tumors with 3+3 vs >3+3 GS. Results: In total, 100 PCa lesions were analysed, of which 20 and 80 had GS of 3+3 and >3+3, respectively. The best model performance was obtained by selecting the top 1% of features of T2w, ADCm and K, with ROC AUC of 0.88 (95% CI of 0.82–0.95). Features from T2 mapping provided little added value. The most useful texture features were based on the gray-level co-occurrence matrix, Gabor transform, and Zernike moments. Conclusion: Texture feature analysis of DWI, post-processed using monoexponential and kurtosis models, and T2w demonstrated good classification performance for GS of PCa. In the multisequence setting, the optimal radiomics-based texture extraction methods and parameters differed between image types.

UTUPub

Turvemaiden digitaalinen kartoitus ja turvepeltolohkojen tunnistaminen (Digital mapping of peatlands and identification of peat field parcels)

Author: Auri Jaakko
Heikkinen Jaakko
Kanaoja Tapio
Kekkonen Hanna
Kivilompolo Janne
Kivimäki Arttu
Laatikainen Matti
Lerssi Jouni
Madetoja Jaakko
Middleton Maarit
Myllys Merja
Mäkinen Ville
Möller Åke
Nousiainen Maarit
Oksanen Juha
Pitkänen Timo P
Pohjankukka Jonne
Puttonen Eetu
Räsänen Timo
Salmivaara Aura
Salo Tapio
Säävuori Heikki
Torppa Johanna
Väänänen Tapio
Publication venue: Luonnonvarakeskus
Publication date: 01/01/2023

Reducing the climate and waterway emissions of agricultural peatlands requires identifying peat field parcels, but soil data has not been accurate enough for this purpose. The aim of the work presented in this report was to produce more precise spatial data on the occurrence and thickness of peatlands in order to identify peat field parcels. A new spatial data set on peatland occurrence and thickness was created using machine learning modeling. The modeling was done with the Random Forest method. As explanatory data for peat occurrence, 117 nationwide data sets were prepared, consisting of remote sensing data measured from satellite and airborne platforms and geological spatial data. For training and testing the machine learning model, 3.5 million soil observations were compiled, of which 70% were used for model training and 30% for independent testing. The modeling predicted the occurrence of the peat thickness classes ≥ 10 cm, ≥ 30 cm, ≥ 40 cm, and > 60 cm at a 50 m × 50 m raster resolution, and predictions were produced for all land areas regardless of land use. The accuracy of the model predictions was high. The peat thickness classes could be distinguished from other soil types and peat thickness classes with 89–96% accuracy. Accuracy was highest for the thin peat thickness classes and slightly lower for the thick classes. On agricultural land, the area of peatland at least 30 cm thick was estimated at 273,000 ha, which is about 11% of the agricultural land area. On 73% of this area, the peat layer was > 60 cm thick. Our estimate of the area of agricultural peatlands (≥ 30 cm) is 8,600 ha larger than what can be estimated from the 1:200,000-scale soil map. A parcel-level analysis showed that the peat predictions make it possible to estimate peat area and thickness on individual field parcels. For example, peat field parcels with a peat layer ≥ 30 cm thick on at least 50% of their area were identified with over 90% accuracy.
The new spatial data set Turpeen paksuus 1.0/2023 (Peat thickness 1.0/2023) refines earlier information on the occurrence and thickness of peatlands across the whole country. Its classification accuracy and spatial resolution are better than those of existing soil map data sets, and it identifies previously unmapped peatlands. Overall accuracy metrics are reported separately for each classification, and the spatial distribution of uncertainty is presented per raster cell by means of the agreement among the Random Forest trees. The new peat predictions open up new opportunities for the planning, steering, and impact assessment of activities related to soil and land use, as well as for research.
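The abstract does not define how tree agreement is computed, so the following is a hedged sketch of one common interpretation: for each raster cell, take the fraction of trees that vote for the majority class. The function names and the nested-list raster layout are assumptions made for illustration only.

```python
from collections import Counter


def tree_agreement(votes):
    """Return (majority_label, agreement) for one raster cell.

    votes     : list with one predicted class label per tree
    agreement : share of trees voting for the majority label, in (0, 1]
    """
    if not votes:
        raise ValueError("at least one tree vote is required")
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes)


def agreement_map(vote_grid):
    """Apply tree_agreement to every cell of a raster given as rows of per-cell vote lists."""
    return [[tree_agreement(votes)[1] for votes in row] for row in vote_grid]
```

Cells where the agreement is close to 1 can be read as confident predictions, while values near 0.5 in a two-class setting flag cells where the ensemble is split.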

Jukuri

Bayesian Approach for Optimizing Forest Inventory Survey Sampling with Remote Sensing Data

Author: Heikkonen Jukka
Pohjankukka Jonne
Tuominen Sakari
Publication venue: 'MDPI AG'
Publication date: 01/01/2022

Jukuri

42th Euromicro Conference on Software Engineering and Advanced Applications, SEAA 2016

Author: Jonne Pohjankukka
Thomas Xu
Ville Leppänen
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 28/10/2022

In this paper, we investigate the traffic characteristics of parallel and high performance computing applications. Parallel applications that utilize multiple processing cores are widespread nowadays due to the trend toward multicore processors. However, the design paradigms of traditional sequential execution and concurrent execution can differ significantly. Therefore, the estimation and prediction approaches used for conventional software can be of limited use for parallel applications. The communication among different nodes in a multicore system should be analysed and categorized in order to improve the accuracy of system simulation. We study several parallel applications running in a full system simulation environment. The communication traces among different nodes are collected and analysed. We discuss the detailed characteristics of these applications. The applications are grouped into different categories depending on several parallel programming paradigms. We apply a power-law model with maximum likelihood estimation, a Gaussian mixture model, and a polynomial model for fitting the trace data. A generic synthetic traffic model is proposed based on the results. Experiments show that the proposed model can be used to evaluate the performance of parallel systems more accurately than other synthetic traffic models.
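The abstract does not say which power-law form or estimator the authors use. As a minimal illustration of the general technique, the sketch below applies the standard maximum likelihood estimator for a continuous power law with density proportional to x^(-alpha) for x >= x_min, namely alpha = 1 + n / sum(ln(x_i / x_min)). The function names, the choice of `x_min`, and the synthetic sampler are assumptions for demonstration and are unrelated to the paper's trace data.

```python
import math
import random


def power_law_alpha_mle(samples, x_min):
    """Maximum likelihood estimate of the exponent of a continuous power law.

    Only samples with x >= x_min are used (the tail the power law is assumed to describe).
    """
    tail = [x for x in samples if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if not tail or log_sum == 0.0:
        raise ValueError("need at least one sample strictly above x_min")
    return 1.0 + len(tail) / log_sum


def sample_power_law(alpha, x_min, n, rng):
    """Draw n synthetic values by inverse transform sampling (for checking the estimator)."""
    # rng.random() lies in [0, 1), so 1 - u lies in (0, 1] and the power is well defined.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]
```

On synthetic data drawn with a known exponent, the estimate should land close to the true value, with a standard error of roughly (alpha - 1) / sqrt(n).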

UTUPub